Text File | 1993-05-24 | 38.7 KB | 1,181 lines |
- /*****************************************************************************
- * File: search.c
- *
- * Author: Rhett "Jonzy" Jones
- * jonzy@cc.utah.edu
- *
- * Date: March 6, 1993
- *
- * Modified: March 15, 1993, by Rhett "Jonzy" Jones.
- * Added the ability to handle the -HUP interrupt.
- *
- * March 16, 1993, by Rhett "Jonzy" Jones.
- * PrintPositions() will only return the collected items if
- * the number of these items is less than 1024. Modified
- * PostPositions() to watch for the case where A and B and C
- * yields (A and B) = NIL, thus this result anded with C will
- * no longer print the entire list C. Oversight on my part.
- *
- * March 27, 1993, by Rhett "Jonzy" Jones.
- * Moved the definition of PORT2USE to the Makefile.
- *
- * March 28, 1993, by Rhett "Jonzy" Jones.
- * Added sun support with the inclusion of:
- * #ifdef sun include <unistd.h>.
- *
- * March 31, 1993, by Mic Kaczmarczik: mic@bongo.cc.utexas.edu
- * Added support for an optional -p port flag when running as a
- * search engine. Added the variable 'searchPort' as a parameter
- * in DoSearch().
- *
- * April 4, 1993, by Rhett "Jonzy" Jones.
- * Added the variables elements2do, indexName, hashName, ifp, and
- * hfp, and the routines GetPositions(), WriteHashTables(), and
- * MakeHashTables(), to give the user some feedback on the status
- * of the program when building the index tables and to support a
- * secondary hash table that limits the amount of memory used.
- * Modified GetDisplayString() and CreateWordsTree().
- *
- * May 5, 1993, by Rhett "Jonzy" Jones.
- * Modified PostPositions(), and added GetAllPositions()
- * to support partial word searches.
- * Added the #define MAXITEMS2RETURN.
- *
- * May 6, 1993, by Rhett "Jonzy" Jones.
- * Modified LogMessage() to return if neither logFile nor debug is
- * true. This was done for efficiency purposes only.
- * Changed the loop counter in CleanUp() from an int to a long.
- * Fixed a bug that caused a core dump if the SIGHUP signal was
- * received more than once. This was rectified by making
- * items[l].positions = NIL in CreateElements(). Oops!
- * Fixed a potential bug in CreateElements() by making sure
- * numElements gets a long value instead of a short.
- * Added the #define SHOW_ITEMS_DURING_DEBUG to ease in debugging.
- * Did some code cleaning up.
- *
- * May 8, 1993, by Rhett "Jonzy" Jones.
- * Added the routine PrintList(), and defined BOOLOP_DEBUG to
- * assist in fixing a problem in PostPositions(), which gave
- * incorrect results for "a* b* not c*". Thank you
- * doyle@liberty.uc.wlu.edu for bringing this to my attention.
- *
- *
- * May 10, 1993, by Rhett "Jonzy" Jones.
- * Modified PostPositions() to correct a problem of a search on
- * "a and b and c", where if "a and b" evaluated to nothing, the
- * result of anding this result with c produced c. Example:
- * "a and b and c" was the same as "a and b or c" if "a and b"
- * evaluated to nothing. Thanks doyle@liberty.uc.wlu.edu for
- * bringing this to my attention.
- * Changed the format of the logfile by altering the string
- * "jughead-Search ->" to now be "jughead(port#) ->", where
- * port# is the port jughead was started up with. This change
- * was at the request of doyle@liberty.uc.wlu.edu to identify
- * which jughead daemon did the logging when more than one
- * jughead is logging to the same log file. Added the variable
- * 'port2log' to support this feature.
- *
- * May 12, 1993, by Rhett "Jonzy" Jones.
- * doyle@liberty.uc.wlu.edu reported that "a and b" returns
- * 'b' if 'a' did not exist, so ... back to the drawing board to
- * fix this one.
- *
- * May 14, 1993, by Rhett "Jonzy" Jones.
- * doyle@liberty.uc.wlu.edu reported that "a and b" returns
- * 'a' if 'b' did not exist. I believe things are fixed now.
- *
- * May 20, 1993, by Rhett "Jonzy" Jones.
- * Added the inclusion of sys/types.h, sys/socket.h, time.h,
- * and pwd.h. Gave support for the -u username to change the
- * process uid when running jughead as a search engine.
- * Added the routine VerifyDataBaseName() to allow verification
- * of the database while allowing for the database to reside in
- * a directory other than jughead. Thank you rzakon@mitre.org
- * for bringing this to my attention.
- *
- * May 22, 1993, by Rhett "Jonzy" Jones.
- * Added DEFAULTBOOLOP, which is defined in the Makefile to
- * allow for an easy change of the default boolean operator
- * to use when no operator separates the words to search for.
- *
- * May 23, 1993, by Rhett "Jonzy" Jones.
- * Gave support for special commands. For information on these
- * commands consult either "searchCmnds.c", the man page, or
- * "About.jughead".
- *
- * May 23, 1993, by Rhett "Jonzy" Jones.
- * At the request of doylej@liberty.uc.wlu.edu modified
- * PrintPositions() to return a link when there are more than
- * MAXITEMS2RETURN items. The link has the form:
- * Name=xxx items found. Please consolidate your request
- * Type=1
- * Port=the port number jughead was started under
- * Path=?all [the requested search]
- * Host=THEHOST as defined in the Makefile
- *
- *
- * May 24, 1993, by Rhett "Jonzy" Jones.
- * At the request of doylej@liberty.uc.wlu.edu gave support
- * for the "?range=start-stop what" special command.
- *
- * Description: Either builds an index into a datafile, where the datafile
- * contains lines of gopher menu items (via jughead), or performs
- * boolean searches on words from the display string in the
- * datafile as read from the resultant index file.
- *
- * When building the index we read from the datafile and for
- * each line we acquire the current line position in the file via
- * ftell(), and extract the display string excluding the first
- * character which is the item type, and break the display string
- * into words. We then dump the word followed by a tab and the
- * position to a temporary file. We then read from the temporary
- * file and build a binary tree containing words and a list of
- * positions, each being the line the word came from. No word is
- * duplicated in the tree. Finally we dump the binary tree to
- * a file with the following format:
- * dataFileNameIndexWasBuiltFrom numberOfnodes
- * lenWord0 word0 0 1 3
- * lenWord1 word1 2 5
- * ...
- * lenThisWord word_numberOfnodes-1 m n
- *
- * As of April 4, 1993, a hash table and an index table get built
- * where the hash table has the following format:
- * dataFileNameIndexWasBuiltFrom numberOfnodes
- * lenWord0 word0 position_in_index_table
- * lenWord1 word1 position_in_index_table
- * ...
- * lenThisWord word_numberOfnodes-1 position_in_index_table
- *
- * and the index table has the following format:
- * number_of_positions position1...position-number_of_positions
- *
- * where: dataFileNameIndexWasBuiltFrom is the name of the data
- * file this index was built from, numberOfnodes is the number of
- * nodes in the tree, lenWordX is the number of characters in wordX
- * including the terminating null, wordX is the word followed by
- * the position in the data file the word came from.
- *
- * When reading from the index file and doing boolean operations,
- * we read the index into memory, and acquire the string the user
- * wants to do a search on. This string is then broken into words,
- * using the same mechanism as breaking the display string into
- * words. If a single word is found we print all lines from the
- * datafile in which the word exists in the display string. If two
- * words are found it is taken to be word1 AND word2. The boolean
- * operations currently supported are AND, NOT, OR. All boolean
- * operations are evaluated left to right.
- *
- * Routines: void LogMessage(int sockfd,char *message);
- * char *GetDisplayString(FILE *fp,long *pos);
- * int CreateWordsTree(char *fileName);
- * static ListType *GetPositions(long index);
- * static void WriteHashTables(TreeType *node);
- * void MakeHashTables(char *fileName,TreeType *root);
- * short ReservedWord(char *word);
- * short ParseSearchString(char *string);
- * static void PrintPositions(int sockfd,ListType *node,
- * long limit,long rangeStart,long rangeEnd);
- * static void PrintList(ListType *l);
- * ListType *DoOperation(short op,ListType *l1,
- * ListType *l2);
- * void LogRequest(int sockfd);
- * static ListType *GetAllPositions(long index,char *what2Find,
- * char *asterik,
- * size_t asterikPos,int sockfd);
- * void PostPositions(int sockfd,char *what2find);
- * static int VerifyDataBaseName(char *fName,char *dName);
- * short CreateElements(char *fileName);
- * void CleanUp(void);
- * int HangUpSignal(void);
- * void DoSearch(char *indexTable,char *logFile,
- * int searchPort);
- *
- * Bugs: No known bugs.
- *
- * Copyright: Copyright 1993, University of Utah Computer Center.
- * This source may be freely distributed as long as this copyright
- * notice remains intact, and is in no way used for any monetary
- * gain, by any institution, business, person, or persons.
- *
- ****************************************************************************/
-
- #include <pwd.h>
- #include <setjmp.h>
- #include <signal.h>
- #include <stdio.h>
- #include <sys/types.h>
- #include <sys/socket.h>
- #include <time.h>
- #ifdef NEXT
- # include <libc.h>
- #else
- # include <stdlib.h>
- #endif
- #include <string.h>
- #include <fcntl.h>
- #include <netinet/in.h>
- #ifdef sun
- # include <unistd.h>
- #endif
-
- #include "tree.h"
- #include "utils.h"
-
- /* Uncomment this if you want to look at the items after they are built. */
- /* #define SHOW_ITEMS_DURING_DEBUG */
- /* Uncomment this to assist in debugging the boolean operations. */
- /* #define BOOLOP_DEBUG */
-
- #define MAXITEMS2RETURN 1024 /* The maximum items to return at once. */
-
- #define NOOP 0 /* The NOOP operation. */
- #define AND 1 /* The AND operation. */
- #define OR 2 /* The OR operation. */
- #define NOT 3 /* The NOT operation. */
- #define MAXWRDSNCMNDS 20 /* The allowable words and commands. */
-
- #define HASHEXT ".ih" /* The hash table extension. */
- #define INDXEXT ".ix" /* The index table extension. */
- #define NINEBACKSPACES "\b\b\b\b\b\b\b\b\b"
-
- /* These are the characters that denote a separator for words. */
- #define DELIMITERS " \t\n\f\r !\"#$%&\'()+,-./:;<=>?@[\\]^_`{|}~"
-
- Element *items = (Element *)NIL; /* Array of words and file positions. */
- long numElements, /* Number of elements in 'items'. */
- elements2do; /* Number of elements left to do. */
- char dataFileName[512], /* The name of the data file. */
- indexName[512], /* Name of the index file. */
- hashName[512], /* Name of the hash file. */
- *logFile = (char *)NIL, /* Name of the file to log to. */
- *wrdsNcmds[MAXWRDSNCMNDS + 1]; /* Array of words and commands. */
- short numCommands; /* The number of words and commands. */
- int port2log; /* The port being used for the log file. */
- jmp_buf hupbuf; /* The place to go if -HUP encountered. */
- FILE *ifp = (FILE *)NIL, /* Pointer to the index file. */
- *hfp = (FILE *)NIL; /* Pointer to the hash file. */
-
- /* The following are defined in "jughead.c". */
- extern char *userName; /* Name of the user to run jughead under. */
- extern int debug, /* Are we debugging? */
- time2process; /* Do we calculate the time for a certain run? */
- extern time_t startTime; /* The time a run was started, for use with 'time2process'. */
-
- extern long lineNumber; /* Defined in "dirTree.c". */
-
- extern char *SpecialCommand(); /* Defined in "searchCmnds.c". */
- /*****************************************************************************
- * LogMessage logs the message 'message' to the end of the log file 'logFile'.
- * The logging is the same as that found in gopherd.c from the University of
- * Minnesota.
- ****************************************************************************/
- void LogMessage(sockfd,message)
- int sockfd; /* The socket file descriptor. */
- char *message; /* The message to log. */
- { static char hostName[256], /* Name of the host we are talking to. */
- ip[256]; /* The host's IP number. */
- struct flock lock; /* Info to lock the log file. */
- time_t theTime; /* The current time. */
- char *cTime, /* The calendar time. */
- *lineFeed, /* Location of the line feed. */
- logEntry[1024]; /* The entry to place in the log. */
- int theLog = -1; /* The file to log to. */
-
- if (!logFile && !debug) /* No sense in using the CPU. */
- return;
-
- if (logFile) /* Open the sucker. */
- theLog = open(logFile,O_WRONLY | O_APPEND | O_CREAT,0644);
-
- hostName[0] = '\0';
-
- if (sockfd > -1)
- IP2Hostname(sockfd,hostName,ip);
- time(&theTime);
- cTime = ctime(&theTime);
- if (lineFeed = strchr(cTime,'\n')) /* Get rid of the line feed. */
- *lineFeed = '\0';
- sprintf(logEntry,"%s %d %s : %s\n",cTime,getpid(),hostName,message);
-
- if (theLog != -1)
- {
- lock.l_type = F_WRLCK;
- lock.l_whence = SEEK_SET;
- lock.l_start = lock.l_len = 0L;
- fcntl(theLog,F_SETLKW,&lock); /* Lock the file so no one can write to it. */
- lseek(theLog,0L,SEEK_END); /* Make sure we are at the end of file. */
- write(theLog,logEntry,strlen(logEntry));
- lock.l_type = F_UNLCK;
- fcntl(theLog,F_SETLKW,&lock); /* Unlock the file. */
- }
-
- if (theLog != -1) /* Only close the log if the open succeeded. */
- close(theLog);
-
- if (debug)
- fprintf(stderr,"%s",logEntry);
-
-
- } /* LogMessage */
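
/*
 * Illustrative sketch, not part of jughead: LogMessage() above appends one
 * entry to a shared log while holding an fcntl() write lock. The helper
 * below isolates that pattern with error checking added. The name
 * append_locked() and its return convention are assumptions made for this
 * example only.
 */

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Append 'entry' to the file 'path' under an advisory write lock.
 * Returns 0 on success and -1 on any failure. */
int append_locked(const char *path, const char *entry)
{
    struct flock lock;
    size_t len;
    int fd, ok;

    assert(path != NULL && entry != NULL);

    fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd == -1)
        return -1;

    memset(&lock, 0, sizeof lock);
    lock.l_type = F_WRLCK;
    lock.l_whence = SEEK_SET;   /* l_start = 0 and l_len = 0 cover the whole file */
    if (fcntl(fd, F_SETLKW, &lock) == -1) {   /* block until the lock is ours */
        close(fd);
        return -1;
    }

    len = strlen(entry);
    ok = write(fd, entry, len) == (ssize_t)len;

    lock.l_type = F_UNLCK;      /* release the lock before closing */
    fcntl(fd, F_SETLK, &lock);
    close(fd);

    return ok ? 0 : -1;
}
```

/*
 * Because the file is opened with O_APPEND, each write lands at the end of
 * the file; the lock keeps entries from several daemons from interleaving.
 */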
-
- /*****************************************************************************
- * GetDisplayString returns the display string portion of the next gopher
- * line read from 'fp', and stores the file position of that line in 'pos'.
- * The item type, which is the first character of the line, is skipped.
- ****************************************************************************/
- char *GetDisplayString(fp,pos)
- FILE *fp; /* The file we are reading. */
- long *pos; /* Position in the file of the line. */
- { static char buf[2048]; /* Buffer for the display string. */
- size_t len; /* Number of characters to the tab. */
- static long line = 0; /* The line number in the file for error reporting. */
- char *s; /* Pointer to the line we acquire from the file. */
- short error = 0; /* Did we encounter an error? */
-
- if (!(--elements2do % 10))
- fprintf(stderr,"%s%9ld",NINEBACKSPACES,elements2do);
-
- *pos = ftell(fp);
- line++;
-
- if (fgets(buf,2048,fp))
- if (s = strchr(buf,'\t'))
- if ((len = (size_t)(s - buf)) > 0)
- {
- buf[len] = '\0';
- return(buf + 1);
- }
- else
- error = 1;
- else
- error = 1;
-
- if (error)
- {
- fprintf(stderr,"%swarning: GetDisplayString found bad line, line = %ld.\n ",NINEBACKSPACES,line);
- *pos = 0; /* Reset the caller's position, not the local pointer. */
- return("");
- }
-
- return((char *)NIL);
-
- } /* GetDisplayString */
-
- /*****************************************************************************
- * CreateWordsTree creates a tree where each node of the tree contains a
- * word and a list of file positions, representing the lines the word is
- * contained in. It should be mentioned that the words are parsed from
- * the display string according to the characters in DELIMITERS.
- ****************************************************************************/
- int CreateWordsTree(fileName)
- char *fileName; /* Name of the data file. */
- { long position; /* Position in the file of the current line. */
- FILE *fpIn; /* The data file we are reading from. */
- char *dStr, /* The display string with no leading item type. */
- *word; /* A word from dStr. */
- int error = 0; /* Did we get an error? */
-
- if (fpIn = fopen(fileName,"r"))
- {
- fprintf(stderr,"Building the words tree...\n");
- fprintf(stderr,"%9ld",elements2do = NumberOfLines(fileName));
-
- while (dStr = GetDisplayString(fpIn,&position))
- if (*dStr)
- for (word = strtok(dStr,DELIMITERS); word; word = strtok(NIL,DELIMITERS))
- BuildTree(&root,StrToLower(word),position);
- fclose(fpIn);
- fprintf(stderr,"%swords tree is now built.\n",NINEBACKSPACES);
- }
- else
- error = fprintf(stderr,"error: CreateWordsTree could not open %s\n",fileName);
-
- return(!error);
-
- } /* CreateWordsTree */
-
- /*****************************************************************************
- * GetPositions returns a list of the file positions, which contains the
- * gopher information to send the client we are talking to, or do a given
- * operation on.
- ****************************************************************************/
- static ListType *GetPositions(index)
- long index;
- { FILE *fp;
- ListType *list = (ListType *)NIL;
- int i,
- numPos;
- long where;
-
- if (debug)
- fprintf(stderr,"In GetPositions with index = %ld, items[%ld].word = [%s], items[%ld].positions = %ld\n",
- index,index,items[index].word,index,items[index].positions->where);
- if (fp = fopen(indexName,"r"))
- {
- if (!fseek(fp,items[index].positions->where,SEEK_SET))
- {
- numPos = GetInt(fp);
- #ifdef BOOLOP_DEBUG
- if (debug)
- fprintf(stderr,"\tnumPos = %d\n",numPos);
- #endif
- for (i = 0; i < numPos; i++)
- {
- where = GetLong(fp);
- #ifdef BOOLOP_DEBUG
- if (debug)
- fprintf(stderr,"\twhere = %ld\n",where);
- #endif
- list = BuildList(list,where);
- }
- }
- else
- fprintf(stderr,"error: GetPositions had fseek fail.\n");
- fclose(fp);
- }
- else
- fprintf(stderr,"error: GetPositions could not open %s for reading\n",indexName);
- return(list);
-
- } /* GetPositions */
-
- /*****************************************************************************
- * WriteHashTables writes the information within the tree pointed to by 'node'
- * to the hash table and the index table.
- ****************************************************************************/
- static void WriteHashTables(node)
- TreeType *node; /* The node to process. */
- { ListType *positions; /* List with the positions. */
-
- if (node)
- {
- WriteHashTables(node->left);
-
- if (!(--elements2do % 10))
- fprintf(stderr,"%s%9ld",NINEBACKSPACES,elements2do);
-
- fprintf(hfp,"%d\t%s\t%ld\n",(int)strlen(node->word) + 1,node->word,(long)ftell(ifp));
-
- fprintf(ifp,"\t%ld",NumberOfListNodes(node->positions));
- for (positions = node->positions; positions; positions = positions->next)
- fprintf(ifp,"\t%ld",positions->where);
- fprintf(ifp,"\n"),++lineNumber;
-
- WriteHashTables(node->right);
- }
-
- } /* WriteHashTables */
-
- /*****************************************************************************
- * MakeHashTables creates the hash and index table such that the hash table
- * contains the data file to index, the number of elements to create, followed
- * by each line containing the size of the word plus 1 for the null character,
- * a tab, the word, tab, and the file position in the index table. The
- * index table contains the number of file positions, tab, the file positions
- * separated by tabs on a single line. Each file position in the index table
- * is the position in the data file where the word found in the hash table
- * is referencing.
- ****************************************************************************/
- void MakeHashTables(fileName,root)
- char *fileName; /* Name of the data file.*/
- TreeType *root; /* The root of the tree. */
- {
- fprintf(stderr,"Building the hash tables...\n");
-
- strcpy(indexName,fileName);
- strcat(indexName,INDXEXT);
- strcpy(hashName,fileName);
- strcat(hashName,HASHEXT);
-
- if (!(ifp = fopen(indexName,"w")))
- fprintf(stderr,"error: MakeHashTables could not open %s for writing\n",indexName);
- if (!(hfp = fopen(hashName,"w")))
- fprintf(stderr,"error: MakeHashTables could not open %s for writing\n",hashName);
- if (!ifp || !hfp)
- exit(-1);
-
- fprintf(hfp,"%s\t%ld\n",fileName,elements2do = NumberOfLeafs(root));
- fprintf(stderr," "); /* To support the use of the backspaces. */
- WriteHashTables(root);
-
- fprintf(stderr,"%shash tables are completed.\n",NINEBACKSPACES);
-
- fclose(ifp);
- fclose(hfp);
- ifp = hfp = (FILE *)NIL;
-
- } /* MakeHashTables */
-
- /*****************************************************************************
- * ReservedWord returns true if 'word' is "AND", "OR", or "NOT" in any mix of
- * upper and lower case, otherwise it returns false. If 'word' is "AND" this
- * routine returns 1, if 'word' is "OR" we return 2, if 'word' is "NOT" 3
- * gets returned. A reserved word is converted to upper case in place.
- ****************************************************************************/
- short ReservedWord(word)
- char *word; /* The word we are checking if reserved. */
- { int len = strlen(word); /* The length of 'word'. */
-
- switch (len)
- {
- case 2:
- if ((word[0] == 'O' || word[0] == 'o') &&
- (word[1] == 'R' || word[1] == 'r'))
- {
- word[0] = 'O'; word[1] = 'R';
- return(OR);
- }
- break;
- case 3:
- if ((word[0] == 'A' || word[0] == 'a') &&
- (word[1] == 'N' || word[1] == 'n') &&
- (word[2] == 'D' || word[2] == 'd'))
- {
- word[0] = 'A'; word[1] = 'N'; word[2] = 'D';
- return(AND);
- }
- else if ((word[0] == 'N' || word[0] == 'n') &&
- (word[1] == 'O' || word[1] == 'o') &&
- (word[2] == 'T' || word[2] == 't'))
- {
- word[0] = 'N'; word[1] = 'O'; word[2] = 'T';
- return(NOT);
- }
- break;
- default:
- break;
- }
-
- return(NOOP);
-
- } /* ReservedWord */
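
/*
 * Illustrative sketch, not part of jughead: the same classification as
 * ReservedWord() written with a general case-insensitive comparison.
 * Unlike ReservedWord() it does not upper-case the word in place. The names
 * classify_keyword(), equal_ignoring_case(), and the KW_ constants are
 * assumptions made for this example; the values mirror NOOP, AND, OR, NOT.
 */

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum { KW_NOOP = 0, KW_AND = 1, KW_OR = 2, KW_NOT = 3 };

/* Return 1 if 's' and 't' are equal when letter case is ignored, else 0. */
static int equal_ignoring_case(const char *s, const char *t)
{
    while (*s && *t) {
        if (toupper((unsigned char)*s) != toupper((unsigned char)*t))
            return 0;
        s++;
        t++;
    }
    return *s == '\0' && *t == '\0';   /* both strings must end together */
}

/* Map a word to its operator code, or KW_NOOP for an ordinary search word. */
int classify_keyword(const char *word)
{
    assert(word != NULL);

    if (equal_ignoring_case(word, "AND"))
        return KW_AND;
    if (equal_ignoring_case(word, "OR"))
        return KW_OR;
    if (equal_ignoring_case(word, "NOT"))
        return KW_NOT;
    return KW_NOOP;
}
```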
-
- /*****************************************************************************
- * ParseSearchString parses 'string' into words and/or commands which we do
- * searches and boolean searches on. All words and commands get placed into
- * the array 'wrdsNcmds'. If any word is "and", "or", or "not" it is taken to
- * be the command AND, OR, or NOT respectively. If any 2 words are not
- * separated with a command, DEFAULTBOOLOP (defined in the Makefile) is
- * inserted as the separating command.
- ****************************************************************************/
- short ParseSearchString(string)
- char *string; /* The string we are parsing. */
- { char *word; /* The word extracted from 'string'. */
- short lastOneReserved,/* Was the last word reserved? */
- thisOneReserved;/* Is the current word reserved? */
-
- #ifdef IN_THE_FUTURE
- /* In the future the database to use will exist before the tab. */
- if (time2process)
- time(&startTime);
-
- if (!CreateElements(indexTable))
- {
- fprintf(stderr,"error: DoSearch could not create index tree.\n");
- exit(-1);
- }
-
- if (time2process)
- PostTime2Process();
- #endif
-
- for (numCommands = 0, lastOneReserved = 1, word = strtok(string,DELIMITERS);
- numCommands < MAXWRDSNCMNDS && word;
- word = strtok(NIL,DELIMITERS))
- {
- thisOneReserved = ReservedWord(word);
- if (!lastOneReserved && !thisOneReserved)
- {
- wrdsNcmds[numCommands++] = DEFAULTBOOLOP;
- wrdsNcmds[numCommands++] = StrToLower(word);
- }
- else if (thisOneReserved)
- wrdsNcmds[numCommands++] = word;
- else
- wrdsNcmds[numCommands++] = StrToLower(word);
- lastOneReserved = thisOneReserved;
- }
-
- return(numCommands);
-
- } /* ParseSearchString */
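
/*
 * Illustrative sketch, not part of jughead: how a default operator can be
 * slipped between two adjacent search words while tokenizing, in the manner
 * of ParseSearchString() above. For brevity this version splits on white
 * space only and recognizes lower-case operators only. tokenize_query(),
 * is_operator(), and MAX_TOKENS are names assumed for this example.
 */

```c
#include <assert.h>
#include <string.h>

#define MAX_TOKENS 20   /* plays the role of MAXWRDSNCMNDS */

/* Return nonzero if 'w' is one of the lower-case operators. */
static int is_operator(const char *w)
{
    return !strcmp(w, "and") || !strcmp(w, "or") || !strcmp(w, "not");
}

/* Split 'query' in place and store the tokens in 'out', inserting
 * 'default_op' between two words that have no operator between them.
 * 'out' needs MAX_TOKENS + 1 slots because one pass through the loop can
 * store two entries. Returns the number of tokens stored. */
int tokenize_query(char *query, const char *default_op,
                   const char *out[MAX_TOKENS + 1])
{
    int n = 0, last_was_op = 1;   /* start "after an operator": no insert before word 1 */
    char *w;

    assert(query != NULL && default_op != NULL);

    for (w = strtok(query, " \t\r\n"); w && n < MAX_TOKENS;
         w = strtok(NULL, " \t\r\n")) {
        int this_is_op = is_operator(w);

        if (!last_was_op && !this_is_op)
            out[n++] = default_op;   /* two words in a row: imply the default */
        out[n++] = w;
        last_was_op = this_is_op;
    }
    return n;
}
```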
-
- /*****************************************************************************
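- * PrintPositions sends the lines of text from the data file the index table
- * was built from, whose positions in the file are given by 'node->where',
- * to stdout. At most 'limit' items are sent (a negative 'limit' sends them
- * all), or the items 'rangeStart' to 'rangeEnd' when a range is requested.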
- ****************************************************************************/
- static void PrintPositions(sockfd,node,limit,rangeStart,rangeEnd)
- int sockfd; /* The socket file descriptor. */
- ListType *node; /* The list we are printing. */
- long limit, /* The max number of items to return. */
- rangeStart, /* The start of a range. */
- rangeEnd; /* The end of a range. */
- { FILE *fp; /* Pointer to the file with the actual data. */
- char buf[2048]; /* Buffer for the line of text. */
- long i, /* A loop counter. */
- n; /* The number of items to return. */
-
- /* If too many items create some links on the fly and send'em.*/
- if (!rangeStart && limit == MAXITEMS2RETURN && (n = NumberOfListNodes(node)) > MAXITEMS2RETURN)
- {
- char s[1024], /* A temporary string. */
- what[1024]; /* The search command - any special command. */
-
- /* Handle the "All 'n' items" directory. */
- sprintf(s,"1All %ld items\t?all",n);
- SendString(s);
- for (what[0] = i = 0; i < numCommands; i++)
- {
- strcat(what," ");
- strcat(what,wrdsNcmds[i]);
- }
- SendString(what);
- sprintf(s,"\t%s\t%d\r\n",HOSTNAME,port2log);
- SendString(s);
-
- /* Now handle the range directories. */
- for (i = 0; i < n; i += limit)
- {
- rangeStart = i + 1;
- if ((i + limit) < n)
- rangeEnd = i + limit;
- else
- rangeEnd = n;
- sprintf(s,"1items %ld to %ld\t?range=%ld-%ld %s\t%s\t%d\r\n",
- rangeStart,rangeEnd,rangeStart,rangeEnd,
- what,HOSTNAME,port2log);
- SendString(s);
- }
-
- sprintf(s,"TOO MANY ITEMS: %ld items found",n);
- LogMessage(sockfd,s);
- return;
- }
-
- if (fp = fopen(dataFileName,"r"))
- {
- if (rangeStart)
- {
- /* Traverse node to the rangeStart item. */
- for (i = 0; i < rangeStart - 1 && node; i++, node = node->next);
-
- /* Spit out range of items. */
- for ( ; i < rangeEnd && node; i++, node = node->next)
- if (!fseek(fp,node->where,SEEK_SET))
- SendString(fgets(buf,2048,fp));
- }
- else if (limit < 0)
- while (node)
- {
- if (!fseek(fp,node->where,SEEK_SET))
- SendString(fgets(buf,2048,fp));
- node = node->next;
- }
- else for (n = 0; n < limit && node; n++, node = node->next)
- if (!fseek(fp,node->where,SEEK_SET))
- SendString(fgets(buf,2048,fp));
-
- fclose(fp);
- }
- else
- fprintf(stderr,"error: PrintPositions could not open data file %s\n",dataFileName);
-
- } /* PrintPositions */
-
- #ifdef BOOLOP_DEBUG
- /*****************************************************************************
- * PrintList prints the list 'l', and was written solely for debugging.
- ****************************************************************************/
- static void PrintList(l)
- ListType *l; /* The list to print. */
- {
- if (debug)
- while (l)
- {
- fprintf(stderr,"%10ld,",l->where);
- l = l->next;
- }
-
- } /* PrintList */
- #endif
-
- /*****************************************************************************
- * DoOperation returns a list which is the result of "l1 op l2" where 'l1' and
- * 'l2' are lists, and 'op' is the operation to perform which is either the
- * AND, OR, or NOT operation.
- ****************************************************************************/
- ListType *DoOperation(op,l1,l2)
- short op; /* The operation [AND,OR,NOT]. */
- ListType *l1, /* A list we operate on. */
- *l2; /* The other list we operate on. */
- { ListType *result = (ListType *)NIL,/* The result of "l1 op l2". */
- *t = (ListType *)NIL; /* A pointer into 'l1' or 'l2'. */
-
- switch (op)
- {
- case AND:
- while (l1 && l2)
- if (l1->where == l2->where)
- {
- result = BuildList(result,l1->where);
- l1 = l1->next;
- l2 = l2->next;
- }
- else if (l1->where < l2->where)
- l1 = l1->next;
- else if (l1->where > l2->where)
- l2 = l2->next;
- break;
- case OR:
- while (l1 && l2)
- if (l1->where == l2->where)
- {
- result = BuildList(result,l1->where);
- l1 = l1->next;
- l2 = l2->next;
- }
- else if (l1->where < l2->where)
- {
- result = BuildList(result,l1->where);
- l1 = l1->next;
- }
- else if (l1->where > l2->where)
- {
- result = BuildList(result,l2->where);
- l2 = l2->next;
- }
- if (l1)
- t = l1;
- else if (l2)
- t = l2;
- while (t)
- {
- result = BuildList(result,t->where);
- t = t->next;
- }
- break;
- case NOT:
- while (l1 && l2)
- if (l1->where == l2->where)
- {
- l1 = l1->next;
- l2 = l2->next;
- }
- else if (l1->where < l2->where)
- {
- result = BuildList(result,l1->where);
- l1 = l1->next;
- }
- else if (l1->where > l2->where)
- l2 = l2->next;
- while (l1)
- {
- result = BuildList(result,l1->where);
- l1 = l1->next;
- }
- break;
- default:
- result = (ListType *)NIL;
- break;
- }
-
- return(result);
-
- } /* DoOperation */
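
/*
 * Illustrative sketch, not part of jughead: DoOperation() above walks two
 * lists in step, which appears to rely on both lists holding file positions
 * in ascending order. The same merge is shown here on plain arrays so it
 * can be tried in isolation. combine_sorted() and the OP_ constants are
 * names assumed for this example; NOT means "in a but not in b".
 */

```c
#include <assert.h>
#include <stddef.h>

enum { OP_AND = 1, OP_OR = 2, OP_NOT = 3 };

/* Combine two ascending arrays 'a' and 'b' under 'op' and write the result
 * to 'out', which must have room for na + nb entries.
 * Returns the number of entries written. */
size_t combine_sorted(int op, const long *a, size_t na,
                      const long *b, size_t nb, long *out)
{
    size_t i = 0, j = 0, n = 0;

    assert(op == OP_AND || op == OP_OR || op == OP_NOT);

    while (i < na && j < nb) {
        if (a[i] == b[j]) {             /* position present in both lists */
            if (op != OP_NOT)
                out[n++] = a[i];
            i++;
            j++;
        } else if (a[i] < b[j]) {       /* position only in a */
            if (op != OP_AND)
                out[n++] = a[i];
            i++;
        } else {                        /* position only in b */
            if (op == OP_OR)
                out[n++] = b[j];
            j++;
        }
    }

    /* Leftovers of a survive OR and NOT; leftovers of b survive OR only. */
    if (op != OP_AND)
        while (i < na)
            out[n++] = a[i++];
    if (op == OP_OR)
        while (j < nb)
            out[n++] = b[j++];

    return n;
}
```

/*
 * Each input is visited once, so the cost is linear in na + nb.
 */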
-
- /*****************************************************************************
- * LogRequest simply writes the search request to the log file.
- ****************************************************************************/
- void LogRequest(sockfd)
- int sockfd; /* The socket to write to. */
- { char buf[1024];
- short i;
-
- if (!logFile && !debug) /* No sense in using the CPU. */
- return;
-
- sprintf(buf,"jughead(%d) -> ",port2log);
- for (i = 0; i < numCommands; i++)
- if (strlen(buf) + strlen(wrdsNcmds[i]) + 1 < 1024)
- {
- strcat(buf," ");
- strcat(buf,wrdsNcmds[i]);
- }
- else
- break;
- LogMessage(sockfd,buf);
-
- } /* LogRequest */
-
- /*****************************************************************************
- * GetAllPositions acquires all the positions of the partial word search by
- * finding the start and end position, and then OR'ing these positions into
- * a list which gets returned. If 'what2Find' starts off with the wild card
- * character, the asterisk, we post a message to the user and return nil.
- ****************************************************************************/
- static ListType *GetAllPositions(index,what2Find,asterik,asterikPos,sockfd)
- long index; /* Index into the items array. */
- char *what2Find, /* The word we are looking for. */
- *asterik; /* The position of the asterik. */
- size_t asterikPos; /* Number of characters to look at. */
- int sockfd; /* The socket file descriptor, errors only. */
- { ListType *theList = (ListType *)NIL, /* The list to return. */
- *list1 = (ListType *)NIL, /* List of positions to OR against list2. */
- *list2 = (ListType *)NIL; /* List of positions to OR against list1. */
- long start, /* Start of the partial word match. */
- end; /* End of the partial word match. */
-
- if (what2Find == asterik && *what2Find == '*')
- {
- char s[1024];
- sprintf(s,"0Invalid wildcard usage - cannot be the first character.\t\terror.host\t-1\r\n");
- SendString(s);
- sprintf(s,"INVALID WILDCARD USAGE: %s",what2Find);
- LogMessage(sockfd,s);
- return((ListType *)NIL);
- }
-
- /* Find the starting and ending positions of the positions to return. */
- for (start = index - 1; start >= 0 && !strncmp(what2Find,items[start].word,asterikPos); start--);
- for (end = index + 1; end < numElements && !strncmp(what2Find,items[ end ].word,asterikPos); end++);
-
- if (debug)
- fprintf(stderr,"GetAllPositions found starting position = %ld, and ending position = %ld\n",start + 1,end - 1);
-
- /* Process the positions we will be returning. */
- for (start++; start < end; start++)
- if (!list1)
- list1 = GetPositions(start);
- else if (!list2)
- {
- list2 = GetPositions(start);
- theList = DoOperation(OR,list1,list2);
- DestroyList(list1);
- DestroyList(list2);
- list1 = theList;
- list2 = (ListType *)NIL;
- }
-
- /* We may have only got one hit, so make sure we return the information. */
- if (!theList && list1)
- theList = list1;
-
- return(theList);
-
- } /* GetAllPositions */
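
/*
 * Illustrative sketch, not part of jughead: GetAllPositions() above starts
 * from one matching entry in the sorted 'items' array and scans outward in
 * both directions while the prefix still matches. The helper below shows
 * that scan on a plain sorted array of strings. prefix_range() and its
 * parameters are names assumed for this example.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Given a sorted array 'words' of 'count' strings and an index 'hit' whose
 * word begins with 'prefix', store in 'first' and 'last' the inclusive
 * bounds of the run of words sharing that prefix. */
void prefix_range(const char *const *words, long count, long hit,
                  const char *prefix, long *first, long *last)
{
    size_t n = strlen(prefix);
    long lo = hit, hi = hit;

    assert(hit >= 0 && hit < count);
    assert(strncmp(prefix, words[hit], n) == 0);

    /* Matches are contiguous in a sorted array, so stop at the first miss. */
    while (lo > 0 && strncmp(prefix, words[lo - 1], n) == 0)
        lo--;
    while (hi + 1 < count && strncmp(prefix, words[hi + 1], n) == 0)
        hi++;

    *first = lo;
    *last = hi;
}
```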
-
- /*****************************************************************************
- * PostPositions posts the result of doing the boolean operations in
- * 'what2find', looking each word up in the array 'items'.
- ****************************************************************************/
- void PostPositions(sockfd,what2find)
- int sockfd; /* The socket to write to. */
- char *what2find; /* String with words and operations. */
- { short evaluate, /* Do we evaluate the operation? */
- i, /* A loop counter. */
- operater = NOOP,/* Either [NOOP,AND,OR,NOT]. */
- reserved; /* Is the current word reserved? */
- long index, /* Index into the items array. */
- rangeStart = 0, /* The start of a range. */
- rangeEnd = 0, /* The end of a range. */
- limit = MAXITEMS2RETURN; /* The max number of items to return. */
- ListType *list1 = (ListType *)NIL, /* List of positions to operate against list2. */
- *list2 = (ListType *)NIL, /* List of positions to operate against list1. */
- *result = (ListType *)NIL, /* The result of "list1 operation list2". */
- *tList = (ListType *)NIL; /* Temporary list to support partial word searches. */
- char *asterik; /* Is this a partial word search? */
- size_t asterikPos; /* Number of characters to the asterik. */
-
- if (!(what2find = SpecialCommand(what2find,&limit,&rangeStart,&rangeEnd)))
- return;
- if (ParseSearchString(what2find))
- {
- LogRequest(sockfd);
- for (evaluate = i = 0; i < numCommands; i++)
- if (reserved = ReservedWord(wrdsNcmds[i]))
- operater = reserved;
- else if ((index = BinarySearch(StrToLower(wrdsNcmds[i]),items,numElements,&asterik,&asterikPos)) >= 0)
- {
- if (asterik) /* We have a partial word search. */
- tList = GetAllPositions(index,wrdsNcmds[i],asterik,asterikPos,sockfd);
- else
- tList = GetPositions(index);
- if (!list1 && !evaluate)
- {
- result = list1 = tList;
- evaluate = 1;
- }
- else if (!list2)
- {
- list2 = tList;
-
- result = DoOperation(operater,list1,list2);
- #ifdef BOOLOP_DEBUG
- if (debug)
- {
- fprintf(stderr,"list1 ==> ");
- PrintList(list1);
- fprintf(stderr,"\nlist2 ==> ");
- PrintList(list2);
- fprintf(stderr,"\nresult ==> ");
- PrintList(result);
- fprintf(stderr,"\n");
- }
- #endif
- DestroyList(list1);
- DestroyList(list2);
- list1 = result;
- list2 = (ListType *)NIL;
- operater = NOOP;
- }
- }
- else
- {
- char message[256];
- /* Bound the word so a long search term cannot overflow message[]. */
- sprintf(message,"COULD NOT FIND [%.200s]",wrdsNcmds[i]);
- LogMessage(sockfd,message);
- if (operater == AND)
- {
- DestroyList(result);
- result = list1 = (ListType *)NIL;
- }
- evaluate = 1;
- }
-
- if (result)
- {
- PrintPositions(sockfd,result,limit,rangeStart,rangeEnd);
- DestroyList(result);
- }
- }
-
- } /* PostPositions */
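/*****************************************************************************
 * Illustration only. PostPositions folds the search words left to right,
 * combining the running result with the next word's positions through
 * DoOperation(). DoOperation() and ListType are not shown in this part of
 * the file, so the sketch below is an assumption about how AND and OR could
 * work: it uses plain ascending arrays of long rather than ListType, and
 * the names and_positions() and or_positions() are hypothetical.
 ****************************************************************************/

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: AND of two ascending position arrays.
 * Writes the common values to 'out' (room for min(na,nb) entries)
 * and returns how many were written. */
size_t and_positions(const long *a, size_t na,
                     const long *b, size_t nb, long *out)
{
    size_t i = 0, j = 0, n = 0;

    assert(out != NULL);
    while (i < na && j < nb) {
        if (a[i] < b[j])
            i++;                    /* a[i] cannot be in b. */
        else if (a[i] > b[j])
            j++;                    /* b[j] cannot be in a. */
        else {
            out[n++] = a[i];        /* Present in both lists. */
            i++;
            j++;
        }
    }
    return n;
}

/* Hypothetical helper: OR of two ascending position arrays.
 * Writes the merged values without duplicates to 'out'
 * (room for na + nb entries) and returns how many were written. */
size_t or_positions(const long *a, size_t na,
                    const long *b, size_t nb, long *out)
{
    size_t i = 0, j = 0, n = 0;

    assert(out != NULL);
    while (i < na && j < nb) {
        if (a[i] < b[j])
            out[n++] = a[i++];
        else if (a[i] > b[j])
            out[n++] = b[j++];
        else {
            out[n++] = a[i];        /* Keep one copy of a shared value. */
            i++;
            j++;
        }
    }
    while (i < na)                  /* Copy whatever is left over. */
        out[n++] = a[i++];
    while (j < nb)
        out[n++] = b[j++];
    return n;
}
```

/* With this shape, "A and B and C" is and_positions(A,B) fed back in as the
 * left operand against C, which mirrors how list1 is reused above. */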
-
- /*****************************************************************************
- * VerifyDataBaseName verifies we are dealing with the correct database.
- * This routine returns true if we have the correct database and false
- * otherwise.
- ****************************************************************************/
- static int VerifyDataBaseName(fName,dName)
- char *fName, /* The root name of the database. */
- *dName; /* The name we should be dealing with. */
- { char *str; /* Position in fName where dName occurs. */
-
- if (str = strstr(fName,dName))
- {
- if (!strcmp(dName,str))
- strcpy(dName,fName);
- return(1);
- }
- return(0);
-
- } /* VerifyDataBaseName */
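/*****************************************************************************
 * Illustration only. VerifyDataBaseName uses strstr(), which reports the
 * first occurrence of 'dName' in 'fName', so the strcmp() that follows only
 * succeeds when that first occurrence is also the tail of the path. If the
 * intent is "fName ends with dName" (an assumption, the source does not say
 * so), a direct suffix comparison is one way to express it. ends_with() is
 * a hypothetical name and is not part of jughead.
 ****************************************************************************/

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 when 'name' ends with 'suffix', else 0. */
int ends_with(const char *name, const char *suffix)
{
    size_t nameLen, suffixLen;

    assert(name != NULL && suffix != NULL);
    nameLen = strlen(name);
    suffixLen = strlen(suffix);
    if (suffixLen > nameLen)
        return 0;                   /* A longer suffix can never match. */
    /* Compare only the last suffixLen characters of name. */
    return strcmp(name + (nameLen - suffixLen), suffix) == 0;
}
```

/* Unlike the strstr() form, this treats a path such as "data/data" as ending
 * in "data", because it never looks at the earlier occurrence. */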
-
- /*****************************************************************************
- * CreateElements returns true if we could create the dynamic array 'items'
- * and false otherwise.
- ****************************************************************************/
- short CreateElements(fileName)
- char *fileName; /* The name of the file to read. */
- { FILE *fp; /* Pointer to the file we are reading. */
- long l, /* A loop counter. */
- where; /* The position in the "data" file. */
- short strLen; /* The size of the word. */
-
- strcpy(indexName,fileName);
- strcat(indexName,INDXEXT);
- strcpy(hashName,fileName);
- strcat(hashName,HASHEXT);
-
- if (!(fp = fopen(fileName,"r")))
- fprintf(stderr,"error: CreateElements could not open %s for reading\n",fileName);
- if (!(hfp = fopen(hashName,"r")))
- fprintf(stderr,"error: CreateElements could not open %s for reading\n",hashName);
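- if (!fp || !hfp)
- {
- if (fp) /* Close whichever file did open. */
- fclose(fp);
- if (hfp)
- fclose(hfp);
- hfp = (FILE *)NIL;
- return(0);
- }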
-
- (void)GetStr(hfp,dataFileName,512);
-
- if (!VerifyDataBaseName(fileName,dataFileName))
- {
- fprintf(stderr,"error: incompatible database, looking for [%s]\n",dataFileName);
- exit(-1);
- }
-
- numElements = GetLong(hfp);
- if (items = (Element *)malloc(numElements * sizeof(Element)))
- for (l = 0; l < numElements; l++)
- {
- strLen = (short)GetInt(hfp);
- if (items[l].word = (char *)malloc(strLen * sizeof(char)))
- {
- (void)GetStr(hfp,items[l].word,strLen);
- where = GetLong(hfp);
- items[l].positions = (ListType *)NIL;
- items[l].positions = BuildList(items[l].positions,where);
- }
- else
- {
- fprintf(stderr,"error: CreateElements could not get memory for string %ld\n",l);
- exit(-1);
- }
- }
- else
- {
- fprintf(stderr,"error: CreateElements could not get memory for the %ld items\n",numElements);
- exit(-1);
- }
-
- #ifdef SHOW_ITEMS_DURING_DEBUG
- if (debug)
- {
- fprintf(stderr,"items looks like:\n");
- for (l = 0; l < numElements; l++)
- fprintf(stderr,"\t[%ld]\t[%s]\t%ld\n",l,items[l].word,items[l].positions->where);
- }
- #endif
-
- fclose(fp);
- fclose(hfp);
- hfp = (FILE *)NIL;
- return(1);
-
- } /* CreateElements */
-
- /*****************************************************************************
- * CleanUp simply frees up any memory used in the dynamic array 'items'.
- ****************************************************************************/
- static void CleanUp()
- { long i; /* A loop counter. */
-
- for (i = 0; i < numElements; i++)
- {
- free(items[i].word);
- DestroyList(items[i].positions);
- }
- free(items);
- numElements = 0;
-
- } /* CleanUp */
-
- /*****************************************************************************
- * HangUpSignal resets the hangup interrupt, and returns to the saved state.
- ****************************************************************************/
- int HangUpSignal()
- {
- signal(SIGHUP,HangUpSignal); /* Set things for next time. */
-
- LogMessage(-1,"SIGHUP signal encountered");
- if (debug)
- fprintf(stderr,"Releasing memory ...");
-
- CleanUp();
- if (debug)
- fprintf(stderr,"\nRebuilding the binary tree.\n");
-
- longjmp(hupbuf,0); /* Jump to the saved state. */
-
- } /* HangUpSignal */
-
- /*****************************************************************************
- * DoSearch is the search server part of jughead. This routine never
- * returns.
- ****************************************************************************/
- void DoSearch(indexTable,logFile,searchPort)
- char *indexTable, /* Name of the index table file. */
- *logFile; /* The file to log to. */
- int searchPort; /* Port to use */
- { int childspid, /* The process ID of the child process. */
- s, /* The socket file descriptor. */
- newS, /* The new socket file descriptor. */
- addressLen; /* Size of the address structure. */
- struct sockaddr_in address; /* Address of connecting entity. */
-
- fprintf(stderr,"jughead, Copyright 1993, University of Utah Computer Center\n");
- fprintf(stderr,"Using index table %s\n",indexTable);
- fprintf(stderr,"Using port %d\n",port2log = searchPort);
-
- if (userName)
- {
- struct passwd *pswd;
- if (pswd = getpwnam(userName))
- if (setuid(pswd->pw_uid))
- {
- fprintf(stderr,"error: could not setuid %ld (%s)\n",(long)pswd->pw_uid,userName);
- exit(-1);
- }
- else
- fprintf(stderr,"Running with setuid %ld (%s)\n",(long)pswd->pw_uid,userName);
- else
- {
- fprintf(stderr,"error: could not find user %s\n",userName);
- exit(-1);
- }
- }
-
-
- if (logFile)
- {
- char message[512];
- fprintf(stderr,"Logging to %s\n",logFile);
- strcpy(message,"STARTED UP ON ");
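- /* Append at most what still fits, leaving room for the terminator. */
- strncat(message,indexTable,sizeof(message) - strlen(message) - 1);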
- LogMessage(-1,message);
- }
-
- if ((s = ListenerEstablished(searchPort)) < 0)
- exit(-1);
-
- /* Set things up to handle the -HUP signal. */
- if (signal(SIGHUP,SIG_IGN) != SIG_IGN)
- signal(SIGHUP,HangUpSignal);
- setjmp(hupbuf);
-
- #ifndef IN_THE_FUTURE
- if (time2process)
- time(&startTime);
-
- if (!CreateElements(indexTable))
- {
- fprintf(stderr,"error: DoSearch could not create index tree.\n");
- exit(-1);
- }
-
- if (time2process)
- PostTime2Process();
- #endif
-
- if (debug)
- fprintf(stderr,"Ready for incoming connections.\n");
-
- while (1) /* Wait for and handle all connections. */
- {
- addressLen = sizeof(address);
- if ((newS = accept(s,(struct sockaddr *)&address,&addressLen)) < 0)
- {
- fprintf(stderr,"error: DoSearch could not accept\n");
- exit(-1);
- }
- if ((childspid = fork()) < 0)
- {
- fprintf(stderr,"error: DoSearch could not fork\n");
- exit(-1);
- }
- else if (!childspid) /* Child process so */
- { /* close original socket */
- close(s);
- ProcessRequest(newS);
- #ifdef IN_THE_FUTURE
- CleanUp();
- #endif
- exit(0);
- }
-
- ChildProcess(); /* Wait for the child process to die. */
- close(newS); /* Close the new socket we opened. */
- }
-
- } /* DoSearch */
-